We focus on causal discovery in the presence of measurement error in linear systems where the mixing matrix, i.e., the matrix indicating the independent exogenous noise terms pertaining to the observed variables, is identified up to permutation and scaling of the columns. We demonstrate a somewhat surprising connection between this problem and causal discovery in the presence of unobserved parentless causes, in the sense that there is a mapping, given by the mixing matrix, between the underlying models to be inferred in these problems. Consequently, any identifiability result based on the mixing matrix for one model translates to an identifiability result for the other model. We characterize to what extent the causal models can be identified under a two-part faithfulness assumption. Under only the first part of the assumption (corresponding to the conventional definition of faithfulness), the structure can be learned up to the causal ordering among an ordered grouping of the variables but not all the edges across the groups can be identified. We further show that if both parts of the faithfulness assumption are imposed, the structure can be learned up to a more refined ordered grouping. As a result of this refinement, for the latent variable model with unobserved parentless causes, the structure can be identified. Based on our theoretical results, we propose causal structure learning methods for both models, and evaluate their performance on synthetic data.
translated by 谷歌翻译
Linear structural causal models (SCMs)-- in which each observed variable is generated by a subset of the other observed variables as well as a subset of the exogenous sources-- are pervasive in causal inference and casual discovery. However, for the task of causal discovery, existing work almost exclusively focus on the submodel where each observed variable is associated with a distinct source with non-zero variance. This results in the restriction that no observed variable can deterministically depend on other observed variables or latent confounders. In this paper, we extend the results on structure learning by focusing on a subclass of linear SCMs which do not have this property, i.e., models in which observed variables can be causally affected by any subset of the sources, and are allowed to be a deterministic function of other observed variables or latent confounders. This allows for a more realistic modeling of influence or information propagation in systems. We focus on the task of causal discovery form observational data generated from a member of this subclass. We derive a set of necessary and sufficient conditions for unique identifiability of the causal structure. To the best of our knowledge, this is the first work that gives identifiability results for causal discovery under both latent confounding and deterministic relationships. Further, we propose an algorithm for recovering the underlying causal structure when the aforementioned conditions are satisfied. We validate our theoretical results both on synthetic and real datasets.
translated by 谷歌翻译
This paper studies how to flexibly integrate reconstructed 3D models into practical 3D modeling pipelines such as 3D scene creation and rendering. Due to the technical difficulty, one can only obtain rough 3D models (R3DMs) for most real objects using existing 3D reconstruction techniques. As a result, physically-based rendering (PBR) would render low-quality images or videos for scenes that are constructed by R3DMs. One promising solution would be representing real-world objects as Neural Fields such as NeRFs, which are able to generate photo-realistic renderings of an object under desired viewpoints. However, a drawback is that the synthesized views through Neural Fields Rendering (NFR) cannot reflect the simulated lighting details on R3DMs in PBR pipelines, especially when object interactions in the 3D scene creation cause local shadows. To solve this dilemma, we propose a lighting transfer network (LighTNet) to bridge NFR and PBR, such that they can benefit from each other. LighTNet reasons about a simplified image composition model, remedies the uneven surface issue caused by R3DMs, and is empowered by several perceptual-motivated constraints and a new Lab angle loss which enhances the contrast between lighting strength and colors. Comparisons demonstrate that LighTNet is superior in synthesizing impressive lighting, and is promising in pushing NFR further in practical 3D modeling workflows. Project page: https://3d-front-future.github.io/LighTNet .
translated by 谷歌翻译
双重编码器结构成功地利用了两个特定语言的编码器(LSE)进行代码转换语音识别。由于LSE由两个预训练的语言特定模型(LSM)初始化,因此双编码器结构可以利用足够的单语言数据并捕获单个语言属性。但是,现有方法对LSE的语言没有限制,并且不足以针对LSM的语言知识。在本文中,我们提出了一种特定语言的特征辅助(LSCA)方法来减轻上述问题。具体来说,在培训期间,我们引入了两种特定语言的损失作为语言限制,并为其生成相应的语言目标。在解码过程中,我们通过组合两个LSM和混合模型的输出概率来考虑LSM的解码能力,以获得最终预测。实验表明,LSCA的训练或解码方法可以改善模型的性能。此外,通过组合LSCA的训练和解码方法,最佳结果可以在代码切换测试集上获得多达15.4%的相对误差。此外,该系统可以通过使用我们的方法来很好地处理代码转换语音识别任务,而无需额外的共享参数,甚至可以基于两个预训练的LSM进行重新训练。
translated by 谷歌翻译
视觉变换器将每个图像分成具有固定长度的令牌序列,并以与自然语言处理中的单词相同的方式处理令牌。更多令牌通​​常会导致更好的性能,但计算成本显着增加。通过谚语“一张图片胜过千言万语”,我们的目标是通过制造长图像短而加速VIT模型。为此,我们提出了一种新颖的方法在推论期间自适应地分配令牌长度。具体而言,我们首先培养一种含有可调整化 - vit(Revit)的Vit模型,可以处理任何具有不同令牌长度的给定输入。然后,我们从Revit检索“令牌长度标签”,并使用它培训轻量级令牌长度分配(TLA)。令牌长度标签是最小的令牌,以分割Revit可以使REVIT可以进行正确的预测,并且学习TLA以基于这些标签分配最佳令牌长度。 TLA使REVIT能够在推理期间使用最小足够数量的令牌处理图像。因此,通过减少VIT模型中的令牌数字来提高推广速度。我们的方法是一般的,与现代视觉变压器架构兼容,可以显着减少计算扩展。我们在两个任务中验证了我们对多个代表性VIT模型(DEIT,LV-VIT和TIMESFRER)的效果(图像分类和动作识别)。
translated by 谷歌翻译
这项工作调查了神经架构搜索中的批量标准化(NAS)。具体来说,Frankle等人。发现培训Batchnorm只能实现非竞争性能。此外,陈等人。声称培训Batchnorm只能加快10次单次NAS超网关的培训。批判性地,没有努力理解1)为什么训练Batchnorm只能找到具有减少的超空网训练时间的表演井架构,而且2)列车-BN的超网和标准列车超空网之间有什么区别。我们首先显示列车-BN网络融合到神经切线内核制度,从理论上获得与所有参数的所有参数相同的训练动态。我们的证据支持索赔仅在超培训时间上训练Batchnorm。然后,我们经验披露了培训-BN的超标网络在其他运营商的卷曲中提供了优势,导致架构之间的不公平竞争。这是因为只有卷积运算符被附加到Batchnorm。通过实验,我们表明这种不公平性使得搜索算法容易选择具有卷积的模型。为了解决这个问题,我们通过在每个操作员上放置批处理层来引入搜索空间的公平性。然而,我们观察到Chen等人的性能预测因子。在新的搜索空间上不可应用。为此,我们提出了一种新颖的综合性能指标,从三个视角评估网络:源自Batchnorm的理论属性的表达性,培训和不确定性。我们展示了我们对多NAS基准的方法(NAS-BENCH101,NAS-BENCH-201)和搜索空间(飞镖搜索空间和MOBILENET搜索空间)的有效性。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
translated by 谷歌翻译